Extract all image links from wiki

Extract and display the links of all .jpg images on the Wikipedia page en.wikipedia.org/wiki/Peter_Jeffrey_(RAAF_officer).

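from urllib.request import urlopen
from bs4 import BeautifulSoup
import re

# Download the article's HTML
html = urlopen('https://en.wikipedia.org/wiki/Peter_Jeffrey_(RAAF_officer)')

# Parse it with Python's built-in HTML parser
bs = BeautifulSoup(html, 'html.parser')

# Select <img> tags whose src contains ".jpg"; the dot is escaped so it matches a literal "."
images = bs.find_all('img', {'src': re.compile(r'\.jpg')})

# Print one link per line
for image in images:
    print(image['src'])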

Output:

//upload.wikimedia.org/wikipedia/commons/thumb/a/af/NlaJeffrey1942-43.jpg/220px-NlaJeffrey1942-43.jpg
//upload.wikimedia.org/wikipedia/commons/thumb/c/c5/008315JeffreyTurnbull1941.jpg/260px-008315JeffreyTurnbull1941.jpg
//upload.wikimedia.org/wikipedia/commons/e/ea/021807CameronJeffrey1941.jpg
//upload.wikimedia.org/wikipedia/commons/thumb/9/92/AC0072JeffreyTruscottKittyhawks1942.jpg/280px-AC0072JeffreyTruscottKittyhawks1942.jpg
//upload.wikimedia.org/wikipedia/commons/thumb/2/26/VIC1689Jeffrey1945.jpg/280px-VIC1689Jeffrey1945.jpg
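
The links in the output above are scheme-relative (they begin with //), so they need a scheme before they can be opened or downloaded on their own. The following is a minimal sketch of one way to resolve them, assuming the standard library's urllib.parse.urljoin is acceptable here. PAGE_URL and to_absolute are names made up for this illustration and are not part of the original how-to.

```python
from urllib.parse import urljoin

# Page the links were scraped from; used as the base for resolution.
PAGE_URL = 'https://en.wikipedia.org/wiki/Peter_Jeffrey_(RAAF_officer)'


def to_absolute(src, base=PAGE_URL):
    """Resolve a scheme-relative or relative src against the page URL."""
    return urljoin(base, src)


# One of the links from the output above.
src = '//upload.wikimedia.org/wikipedia/commons/e/ea/021807CameronJeffrey1941.jpg'
print(to_absolute(src))
```

In the scraping loop, the same helper could wrap image['src'] before printing, so that each line is a complete URL that inherits the https scheme from the page.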

See also

https://www.w3resource.com/python-exercises/web-scraping/web-scraping-exercise-8.php

© Copyright 2020, Sergiy Zaytsev, szaytsev@hotmail.com. Created using Sphinx 2.3.0.